rocksdb: use max_manifest_file_size option #25341
Conversation
Cockroach uses a single long-running rocksdb instance for the entire process lifetime, which could be many months. By default, rocksdb tracks filesystem state changes in a log file called the MANIFEST, which grows without bound until the instance is re-opened. We should bound the maximum file size of the rocksdb MANIFEST using the corresponding rocksdb option to prevent unbounded growth. The MANIFEST file grew to several GBs in size in a customer bug report, but that was probably because of some other bad behavior in rocksdb state management. We want to bound the MANIFEST size in such cases as well. Release note: None
Suggestions on how to validate this change are welcome. I can manually verify that the setting works as advertised by using a smaller bound and reducing the sstable size. What other validations/regression tests should be done?
Wow, this seems like a really bad default on the rocksdb side. Why would this default to a single manifest file of unbounded size?
As for validation, your idea of making this configurable (at least within a test) and verifying that the manifest does not exceed this size seems reasonable.
@petermattis Using a small memtable and sstable size (4 KB) and a small value for the max manifest file size (4 KB), I have verified that the setting works as advertised: I could see the MANIFEST file rolling after it reached 4 KB. I also ran manifest_dump on the rolled manifest file and verified that it starts with the previous snapshot (represented as an "AddedFiles" edit). A 4 KB manifest contained about 40 entries, which means a 128 MB manifest will allow for about 1.2 million file operations before rolling. That could be several TB of data churn with cockroach's default sstable size. I'm good with merging this.
Shall I post a cherry-pick on 2.0 as well?
bors r+ |
25341: rocksdb: use max_manifest_file_size option r=petermattis a=garvitjuniwal

Cockroach uses a single long running rocksdb instance for the entire process lifetime, which could be many months. By default, rocksdb tracks filesystem state changes in a log file called the MANIFEST, which grows without bound until the instance is re-opened. We should bound the maximum file size of rocksdb MANIFEST using the corresponding rocksdb option to prevent unbounded growth.

The MANIFEST file grew to several GBs in size in a customer bug report but that was probably because of some other bad behavior in rocksdb state management. We do want to bound the MANIFEST size in such cases as well.

Release note: None

Co-authored-by: Garvit Juniwal <[email protected]>
@garvitjuniwal Yes, let's cherry-pick this into 2.0. FYI, https://github.com/benesch/backport eases the backport process. You'd end up running something like |
Build succeeded
25471: backport-2.0: rocksdb: use max_manifest_file_size option r=bdarnell a=tschottdorf

Backport 1/1 commits from #25341.

/cc @cockroachdb/release

---

Cockroach uses a single long running rocksdb instance for the entire process lifetime, which could be many months. By default, rocksdb tracks filesystem state changes in a log file called the MANIFEST, which grows without bound until the instance is re-opened. We should bound the maximum file size of rocksdb MANIFEST using the corresponding rocksdb option to prevent unbounded growth. The MANIFEST file grew to several GBs in size in a customer bug report but that was probably because of some other bad behavior in rocksdb state management. We do want to bound the MANIFEST size in such cases as well.

Release note: None

25474: backport-2.0: storage: fix deadlock in consistency queue r=bdarnell a=tschottdorf

Backport 1/1 commits from #25456.

/cc @cockroachdb/release

---

When `CheckConsistency` returns an error, the queue checks whether the store is draining to decide whether the error is worth logging. Unfortunately this check was incorrect and would block until the store actually started draining. A toy example of this problem is below (this will deadlock). The dual return form of chan receive isn't non-blocking -- the second parameter indicates whether the received value corresponds to a closing of the channel. Switch to a `select` instead.

```go
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	_, ok := <-ch
	fmt.Println(ok)
}
```

Touches #21824.

Release note (bug fix): Prevent the consistency checker from deadlocking. This would previously manifest itself as a steady number of replicas queued for consistency checking on one or more nodes and would resolve by restarting the affected nodes.

Co-authored-by: Garvit Juniwal <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>